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Isotonic regression is the problem of 
fitting data to order constraints. This 
problem can be solved numerically in an 
efficient way by successive projections onto 
order simplex constraints. An algorithm 
for solving the isotonic regression using 
successive projections onto order simplex 
constraints was originally suggested and 
analyzed by Grotzinger and Witzgall. This 
algorithm has been employed repeatedly in 
a wide variety of applications. In this 
paper we briefly discuss the isotonic 
regression problem and its solution by 
the Grotzinger- Witzgall method. We 
demonstrate that this algorithm can be 



appropriately modified to run on a parallel 
computer with substantial speed-up. 
Finally we illustrate how it can be used 
to pre-process mass spectral data for 
automatic high throughput analysis. 
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1. Introduction 

Given a finite set of real numbers, Y= (y^, . . . , y'n}, 
the problem of isotonic regression with respect to a 
complete order is the following quadratic programming 
problem: 

minimize, X/=i^X-^^ ">'/)' n) 

subject to x^ < • 



•<x„ 



where the w^ are strictly positive weights. Many impor- 
tant problems in statistics and other disciplines can be 
posed as isotonic regression problems. In epidemiolo- 
gy, binary longitudinal data are often collected in clini- 
cal trials of chronic diseases when interest is on assess- 
ing the effect of a treatment over time. Transitional 
models for longitudinal binary data subject to non- 
ignorable missing data can be developed but parameter 
estimation must be done in conjunction with an iso- 
tonic regression (see Ref [3]). In classical time series 
analysis applied to global climate prediction, complex 
processes are often modelled as three additive compo- 
nents: long-time trend, seasonal effect and background 
noise. The long-time trend superimposed with the 



seasonal effect constitute the mean part of the process. 
The important issue of mean stationarity is usually the 
first step for statistical inference. Researchers have 
developed a testing and estimation theory for the 
existence of a monotonic trend and the identification of 
important seasonal effects. The associated statistical 
inference and probabilistic diagnostics result from 
solving a problem generically called the change-point 
problem. This change-point problem initially arose in 
quality control assessment. It includes, for example, 
the testing for changes in weather patterns and disease 
rates. Isotonic regression is necessary to test and 
estimate the trend and to determine periodic compo- 
nents (see Ref [15]). For this application the isotonic 
regression yields estimators for the long-time trend 
with negligible influence from the seasonal effect, 
a desirable property. Recently, the development of 
algorithms for automatic peak and trough detection in 
raw mass spectrometer data have become an active 
research area. Motivated by the increased ability to 
produce high-quality data quickly, these algorithms 
have become crucial in cost-effective analysis of data 
(see Ref [12, 13]). One difficulty in applying automat- 
ic structure revealing algorithms to raw spectroscopy 
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data is the failure of data to be monotonic (usually as a 
function of mass or time). This can occur for many 
reasons including truncation and roundoff error, 
machine calibration errors, ...etc. In this case, isotonic 
regression has become an important pre-processing 
step before data analysis. 

Clearly the isotonic regression problem is an opti- 
mization problem encountered frequently when dealing 
with data generated with any uncertainty as is the 
case in many application problems in science and 
engineering. In their respective monographs Barlow, 
Bartholomew, Bremner, and Brunk and Robertson, 
Wright, and Dykstra have written comprehensive 
surveys of this subject (see Refs. [1, 9]). 

1.1 A Simple Example 

In this section we describe a simple example of the 
isotonic regression problem. Suppose that {1, 3, 2, 4, 5, 
7, 6, 8} is a given set of real numbers, and that all the 
weights associated with these numbers are all identical- 
ly one. This set is almost isotonic; however, {3,2} and 
{7,6} violate the nondecreasing requirement. One 
simple solution to this difficulty can be constructed by 
replacing each "block" of "violators" with the average 
of the numbers in the block. This produces {1, 2.5, 2.5, 
4, 5, 6.5, 6.5, 8}, which turns out to be the unique 
solution of the isotonic regression problem. This is an 
example of the well-known ''Pool Adjacent Violators'' 
algorithm. One of the best-known examples of a pool 
adjacent violators algorithm for solving the isotonic 
regression problem is due to Grotzinger and Witzgall 
(Ref [4]). In this important paper, the authors propose 
an efficient pool adjacent violator method for identify- 
ing data points that violate the order constraint and, in 
turn, projects the violators onto simplex representations 
of the order constraints. 

In this work, the formulation of the Pool Adjacent 
Violators algorithm due to Grotzinger and Witzgall has 
been altered slightly to run specifically on parallel 
computers. This algorithm has been implemented and 
tested on parallel machines (Ref [7]). 

2. A Decomposition Tlieorem 

To attain the necessary rigor, we exploit a famous 
and very elegant characterization of the solution to 

the isotonic regression problem, ^y = X/=i^7' ^^^^o 

denote the point (0, 0), and let Pj denote the point 

Wj,J^i^wj,)Jorj=\,„,,n. We interpret P, ..,P„ 

as points in the graph of a function, which we extend 



to the interval [0, W„] by linear interpolation. Both the 
function and its graph are called the cumulative sum 
diagram (CSD) of the isotonic regression problem. 
The greatest convex minorant (GCM) of a function/ 
is the convex function defined by 

GCM[f] = sup{<^ : (j) convex, <l>< f) . 

It is a well-known and beautiful result that the iso- 
tonic regression problem is solved by taking x] to 
be the left derivative of GCM [CSD] at Wj . Thus, theo- 
rems about isotonic regressions can be stated and 
proved as theorems about greatest convex minorants. 

The ideas of the work presented here are based on 
convex analysis. A comprehensive survey of convex 
analysis can be found in Ref [10]. Even though only 
elementary tools need be employed to prove this 
theorem, it has profound implications for parallel 
computation. Suppose that we decompose the set Y into 
Y, © 72, where Y, = {y,,... , jj and Y^ ={>',+ 1, .. . ,%}. 
Analogously, we can decompose a function / with 
domain [0,^J into/ ©/, where/ is the restriction of 
/to [0,^J and/ is the restriction of/to (Wj,,W,]. Then 
the following result is easily demonstrated. 

Theorem 1 GCM [GCM[fi] ® OCMlfi] ] = OCM[f] 

Proof: 

Since GCM[f{\ </ and GCM[f,] </, 

GCM[f,] © GCMlf,] </ ©/ =/ 

It follows that, if (^ < GCM[fi] ® <^CM|/], 
then (j) </ and hence that 

GCM[GCM\f,] © GCM[f2\ ] < GCM[f\. 

Conversely, suppose that (j) <f is convex and write 

(^ = <^i © <^2. Then (jy^ </ and <^2 ^/^ so (j)^ < GCM[f{\ and 
(^ < GCM[f2\. It follows that (^<GCM[f,] © GCM[f2l 
and hence that 

GCM[f] < GCM[GCM[f,]®GCM[f,]]. (3) 

Combining inequalities (2) and (3) gives the desired 
result, n 

2.1 Implications for Parallel Computation 

If one takes the function /to be the CSD for the iso- 
tonic regression problem, then Theorem 1 states the fol- 
lowing: decomposing 7 into Yy © Y2, performing sepa- 
rate isotonic regressions on Yy and Y2, and then, per- 
forming a final isotonic regression on the combined 
result, produces the isotonic regression on Y. Because 



122 



Volume 1 1 1, Number 2, March-April 2006 

Journal of Research of the National Institute of Standards and Technology 



the separate isotonic regressions on Y^ and Y2 can be 
performed simultaneously, parallel computations of 
isotonic regressions will be desirable if the final iso- 
tonic regression on the combined result is easy to com- 
pute. In point of fact, this is the case. Suppose that Y^ 
satisfies ji < • • • < j^^ and Y2 satisfies yj,+ i <• • • <y^. If 
y},<yi,+ 1, then Y is isotonic. If Y is not isotonic, then it 
must be because some of the largest numbers in Y^ 
exceed some of the smallest numbers in Y2. The anti- 
dote to this difficulty is to identify this central block of 
offending numbers and to replace each of these num- 
bers with the weighted average of the block. (This is 
just the Pool Adjacent Violators algorithm again.) To 
accomplish this, let 

m = mm {i:y. > y,J , 

M = max {i:y. < y^} , 



and 



M 



M 



i=m i=m 



Then, replacing y. with yfoxi-n%,,,,M gives the 
isotonic regression of Y, Thus, if one decomposes the 
isotonic regression problem and performs two smaller, 
separate isotonic regressions, it becomes fairly simple 
to obtain the solution to the original problem. 

By now it should be apparent that what is being 
proposed in this paper is not a new parallel algorithm 
for isotonic regression that will compete with existing 
algorithms. Rather, it is the isotonic regression problem 
itself that has been parallelized. (An instructive analo- 
gy is the familiar exercise of sorting a list of numbers 
by subdividing the list, sorting each sublist, then inter- 
weaving the sorted sublists.) Because the problem itself 
has been parallelized, any isotonic regression algorithm 
can be used to compute the separate isotonic regres- 
sions assigned to separate processors. The efficiency of 
various isotonic regression algorithms has been dis- 
cussed by Best and Chakravarti [2]. A very fast formu- 
lation of the Pool Adjacent Violators algorithm was 
provided by Grotzinger and Witzgall [4]. 

In light of the preceding arguments, we are virtually 
assured that a parallel approach to isotonic regression 
will speed up computation when n is sufficiently large. 
This phenomenon is demonstrated in Sect. 4. Notice, 
however, that we should not expect that the most effi- 
cient strategy will necessarily be the one that uses the 
largest number of processors, since the more that the 
original problem is decomposed, the more difficult it 
becomes to obtain the final solution from the separate 
isotonic regressions. As an extreme example of this 
limitation, one might decompose Y into n subsets of 



singleton values, in which case nothing whatsoever has 
been accomplished. Furthermore, the more that the 
original problem is decomposed, the greater the com- 
munication costs of parallelization. Hence, it is impos- 
sible to anticipate the most efficient decomposition 
strategy. 

3, Application to the Analysis of Mass 
Spectral Data 

Modem mass spectrometers are capable of produc- 
ing large, high-quality data sets in brief periods of time 
(Ref. [14]). It is not uncommon for a synthetic polymer 
to produce a spectra with hundreds of peaks. This moti- 
vates the design of automated data analysis algorithms 
capable of rapid and repeatable processing of raw mass 
spectrometer data. While many algorithms for the 
analysis of raw mass spectrometer already exist, they 
all require significant operator input. In some cases 
smoothing parameters must be selected, in other cases 
one must identify peaks from noise or vice-versa, and 
many algorithms assume the functional form of data 
close to peaks or troughs. Once the data has been 
processed, for example peaks or troughs have been 
selected and the area underneath portions of the data 
have been calculated, there is still no standard or point 
of comparison (Refs. [5, 6]). 

Recently an algorithm with the potential to automat- 
ically identify peak structure from raw mass spectrom- 
eter output without the use of smoothing, parameter 
specific filtering, or manual data analysis has been 
developed (Refs. [12, 8]). This method requires no 
knowledge of peak shape and no pre- or post-process- 
ing of the data. Experience to date on matrix-assisted 
laser desorption/ionisation-time of flight mass spec- 
trometry (MALDI-TOFMS) shows that the power 
spectrum of the noise cannot be predicted solely from 
the experimental conditions; therefore, blind applica- 
tion of smoothing and/or filtering algorithms may unin- 
tentionally remove information from the data. The new 
method does not have this failing. It does not require 
equal spacing of data points. However, it does require 
that data be monotonic with respect to either mass or 
time. Because raw data may not be monotonic, because 
of machine error, rounding error, sample preparation . . . 
etc., an isotonic regression can and should be performed 
on raw data. The importance of being able to perform 
an isotonic regression without user-input is particularly 
important in this application. The goal of many of these 
algorithms is to provide output independent of any 
operator parameter selection or signal to noise estima- 
tion (Ref [11]). In other words, the isotonic regression 
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treatment of raw data must be extremely robust in order 
to be useful for this application. 

4. Numerical Experiments 

One of the largest potential consumers of these 
algorithms can be found in the field of combinatorial 
chemistry. Indeed, when one produces tens of thousands 
of mass spectra profiles, it is clear that sorting through 
each one of these profiles to identify peak and trough 
structure is simply not possible. As explained before, 
one cannot assume that the data will behave monotoni- 
cally For the purpose of this paper, we simulate this 
effect by taking the first component of ordered pair data 
produced from a MALDI-TOF mass spectrometer. This 
data is then corrupted in a way that introduces a simulat- 
ed numerical error. This is very much like a combinator- 
ial application where, for example, robotic schemes may 
be used to take mass spectral data from samples numer- 
ous times but automatically and rapidly. 

The value y^ is the k\h first component of our 
unequally spaced data set of size n, where n = 50,000. 
The points are normalized so that yj^& [0, 100] for 
1 <k<n. The resulting set Y of n increasing numbers 
was perturbed in various ways to obtain the data sets 
that we subjected to isotonic regression. Each of ten 
strategies for perturbing the original set of numbers was 
rephcated 7?= 100 times, resulting in a total of one 
thousand data sets. 

Let (7=log(2)/1.95996 be fixed. In what follows, 
whenever we perturb a value y^, we do so by replacing 
yf, with yj^ exp(o'z), where z is a standard normal 
deviate. This multiplicative model of measurement 
error was constructed so that approximately 95 % of 
the perturbed values would be at least one half and no 
more than twice the replaced value. 

The following loops describe our perturbation strate- 
gies. In each case, the intent was to perturb P values in 
the form of B blocks of length L. 

For 7? = 1 to 100 repetitions: 

1 . Perturb each of the n values in the original data set 
7 to obtain data set YIOOO.R. 

2. For P=A9n, .25/2, .Oln and L=1,V^. ^: 

(a) Randomly select B = P/L numbers from 
{1, . . . , n/L} without replacement 

Call these numbers ^i, . . . , Sb- 

(b) Let K=n/P. For / = 1, . . . , 5 and j = 0, . . . , 
L - 1, let ^= Ks^+j and perturb each original 
value y^. 



(c) Denote the resulting data set by YOppl.R, 
where pp= \OOP/n and / = 21og (L)/log(P). 

For example, the data set produced on the fourth 
repetition of the case for which P = A9n and 
L = ^ is denoted by Y0491.4. 

Thus, we generated 100 data sets (YIOOO) in which 
all values were perturbed, 300 data sets (Y0491) in 
which 49 % of the values were perturbed, 300 data sets 
(Y0251) in which 25 % of the values were perturbed, 
and 300 data sets (YOOll) in which 1 % of the values 
were perturbed. Furthermore, in each of the cases that 
P= .49/7, .25/7, .01/2 of the values were perturbed, we 
generated 1 00 data sets (YOppO) in which we perturbed 
P isolated values, 100 sets (YOppl) in which we per- 
turbed -7? blocks of ^/p consecutive values, and 100 
data sets (Y0pp2) in which we perturbed one block of 
P consecutive values. 

Each data set was submitted to isotonic regressions on 
a personal computer cluster machine comprised of single 
and multiple processors. These regressions used, respec- 
tively, ^4 = 1,2, 4, 8, 16, 32 of the cluster computer's 
processors. For each regression, the data set was decom- 
posed into A subsets of (approximately) equal size. Each 
subset was simultaneously sent to a separate processor, 
where its isotonic regression was computed using 
Grotzinger's and Witzgall's [4] formulation of the Pool 
Adjacent Violators algorithm. As soon as the isotonic 
regressions of two consecutive subsets were computed, 
the combined result was sent to one of the available 
processors, which then computed the combined isotonic 
regression by means of the device described in Sec. 2. 1 . 
This process was continued until the isotonic regression 
of the entire data set was obtained. The elapsed time from 
job submission to completion was measured by the 
Delta's intrinsic timer. The results are summarized in 
Table 1. 

Table 1. Sample means and standard deviations (y±Sy) of elapsed 
times in milliseconds for five repetitions of ten isotonic regression 
experiments 



Data 




Number of Processors 




Sets 


1 


2 


4 


8 


16 


32 


YIOOO 


2278± 93 


1376±22 


1158±16 


930± 7 


1062±25 


1058±15 


Y0490 


2406±142 


1416± 5 


1182± 4 


938± 8 


1062± 4 


1068± 8 


Y0491 


2376±180 


1436±29 


1208±50 


958±19 


1060±14 


1080±27 


Y0492 


237atl71 


1378± 8 


1152± 8 


922± 4 


1032± 4 


1036±15 


Y0250 


2396± 54 


1386± 9 


1144±11 


944±48 


1040±14 


1040±12 


Y0251 


2298±128 


14iat 7 


1174± 5 


936± 5 


1058± 4 


1052± 4 


Y0252 


233at 95 


1406± 9 


1172± 4 


942± 4 


1066± 9 


1058± 4 


YOOlO 


2232± 30 


1378± 4 


1150± 7 


922± 4 


1034± 5 


1032± 8 


YOOll 


2368±156 


MlOtlO 


1188±24 


94at 


1062± 4 


1062± 4 


Y0012 


2380^=152 


1418± 8 


1184±22 


944± 5 


1068±18 


1070±12 
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Table 1 exhibits several striking features. First, the 
variations in times produced by replications are 
extremely small relative to the magnitudes of the times. 
In retrospect this is not surprising: each data set 
contains a very large number of independent errors, so 
that one should expect that most data sets constructed 
in accordance with a specific perturbation strategy will 
be quite similar. 

Second, there is very little variation in mean timing 
profiles between the perturbation strategies. This 
suggests that the phenomena described below are not 
unique to a particular data structure. 

As anticipated, it is apparent that some degree of 
parallelization decreases the time required to perform 
an isotonic regression. For the data sets that we consid- 
ered, the time required by ^ = 2 processors divided by 
the time required by ^4 = 1 processor ranged from a mini- 
mum of 53.2 % to a maximum of 65.9 %, with a medi- 
an of 60.3 %. The time required by ^ = 4 processors 
divided by the time required by ^ = 1 processor ranged 
from a minimum of 44.9 % to a maximum of 54.8 %, 
with a median of 50.1 %. The time required by ^ = 8 
processors divided by the time required by ^4 = 1 
processor ranged from a minimum of 35.7 % to a max- 
imum of 43.5 %, with a median of 40.5 %. Thus, there 
is compelling evidence that, for n = 50, 000 and these 
types of data sets, using ^ = 8 processors is more effi- 
cient than using ^ = 4, 2, 1 . 

For ^=16, 32, the communication costs of the 
parallelization strategy begin to dominate and the times 
are actually slower than for ^ = 8 processors. This 
phenomenon was also anticipated. With larger data 
sets, we know that we can take advantage of additional 
processors, but the tradeoff between n and the optimal A 
must be empirically determined for the data structures 
and parallel computing system of interest. 

Finally we note that, although the proportional 
improvements in efficiency produced by parallel pro- 
cessing are impressive, the absolute times for serial 
processing are small. At present, it is difficult to forsee 
applications involving isotonic regressions on data sets 
so large that the absolute savings in time will warrant 
parallel computation. Perhaps that day will come; for 
now, our primary interest in parallelizing isotonic 
regression is for the pedagogical value of so doing. In 
our view, isotonic regression is a remarkably simple 
and elegant example of a problem for which mathe- 
matical theory virtually guarantees that parallelization 
will be beneficial. 
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